Overview

Dataset statistics

Number of variables: 13
Number of observations: 200000
Missing cells: 260000
Missing cells (%): 10.0%
Duplicate rows: 7644
Duplicate rows (%): 3.8%
Total size in memory: 19.8 MiB
Average record size in memory: 104.0 B
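
The percentages above follow directly from the raw counts. A minimal sketch of that arithmetic, using only figures quoted in this report (the constant and function names are illustrative, not part of the profiling tool):

```python
# Figures copied from the "Dataset statistics" and "Alerts" blocks of this report.
N_VARIABLES = 13
N_OBSERVATIONS = 200_000
MISSING_PER_VARIABLE = 20_000   # every variable reports 20000 missing values
DUPLICATE_ROWS = 7_644


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


total_cells = N_VARIABLES * N_OBSERVATIONS            # 2,600,000 cells
missing_cells = N_VARIABLES * MISSING_PER_VARIABLE    # 260,000 cells

print(missing_cells, percent(missing_cells, total_cells))   # 260000 10.0
print(percent(DUPLICATE_ROWS, N_OBSERVATIONS))              # 3.8
```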

Variable types

Numeric: 9
Categorical: 2
Boolean: 2

Alerts

Dataset has 7644 (3.8%) duplicate rows [Duplicates]
Height is highly overall correlated with BMI [High correlation]
Weight is highly overall correlated with BMI [High correlation]
BMI is highly overall correlated with Height and 1 other field (Weight) [High correlation]
Diabetes is highly imbalanced (53.1%) [Imbalance]
Student ID has 20000 (10.0%) missing values [Missing]
Age has 20000 (10.0%) missing values [Missing]
Gender has 20000 (10.0%) missing values [Missing]
Height has 20000 (10.0%) missing values [Missing]
Weight has 20000 (10.0%) missing values [Missing]
Blood Type has 20000 (10.0%) missing values [Missing]
BMI has 20000 (10.0%) missing values [Missing]
Temperature has 20000 (10.0%) missing values [Missing]
Heart Rate has 20000 (10.0%) missing values [Missing]
Blood Pressure has 20000 (10.0%) missing values [Missing]
Cholesterol has 20000 (10.0%) missing values [Missing]
Diabetes has 20000 (10.0%) missing values [Missing]
Smoking has 20000 (10.0%) missing values [Missing]
Student ID is uniformly distributed [Uniform]

Reproduction

Analysis started: 2023-09-20 09:42:46.623470
Analysis finished: 2023-09-20 09:43:03.786472
Duration: 17.16 seconds
Software version: ydata-profiling v4.4.0
Download configuration: config.json
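
A report of this kind can typically be regenerated with a few lines of Python. The sketch below assumes the data lives in a CSV file; the function name, the file paths, and the report title are hypothetical, since the report does not say how the data was loaded. `ProfileReport` and `to_file` are the ydata-profiling entry points.

```python
def build_report(csv_path: str, html_path: str = "report.html") -> str:
    """Profile the CSV at csv_path and write an HTML report to html_path."""
    # Imported lazily so this snippet loads even where the optional
    # third-party packages (pandas, ydata-profiling) are not installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path)                       # hypothetical input file
    profile = ProfileReport(df, title="Profiling Report")
    profile.to_file(html_path)                       # writes the HTML report
    return html_path
```

Calling `build_report("students.csv")` (a placeholder file name) would write `report.html` to the working directory.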

Variables

Student ID
Real number (ℝ)

MISSING  UNIFORM 

Distinct: 98976
Distinct (%): 55.0%
Missing: 20000
Missing (%): 10.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 49974.042
Minimum: 1
Maximum: 100000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.5 MiB

Quantile statistics

Minimum: 1
5-th percentile: 4985.95
Q1: 24971.75
median: 49943.5
Q3: 74986
95-th percentile: 94984
Maximum: 100000
Range: 99999
Interquartile range (IQR): 50014.25

Descriptive statistics

Standard deviation: 28879.642
Coefficient of variation (CV): 0.57789285
Kurtosis: -1.2008208
Mean: 49974.042
Median Absolute Deviation (MAD): 25007
Skewness: 0.0010832903
Sum: 8.9953276 × 10^9
Variance: 8.340337 × 10^8
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Common values

Value                 Count   Frequency (%)
54928                 2       < 0.1%
60423                 2       < 0.1%
60431                 2       < 0.1%
60430                 2       < 0.1%
60429                 2       < 0.1%
60428                 2       < 0.1%
60427                 2       < 0.1%
60426                 2       < 0.1%
60424                 2       < 0.1%
93957                 2       < 0.1%
Other values (98966)  179980  90.0%
(Missing)             20000   10.0%

Minimum 10 values

Value                 Count   Frequency (%)
1                     1       < 0.1%
2                     2       < 0.1%
3                     1       < 0.1%
4                     1       < 0.1%
5                     2       < 0.1%
6                     2       < 0.1%
7                     2       < 0.1%
8                     2       < 0.1%
9                     2       < 0.1%
10                    2       < 0.1%

Maximum 10 values

Value                 Count   Frequency (%)
100000                2       < 0.1%
99999                 2       < 0.1%
99998                 2       < 0.1%
99997                 1       < 0.1%
99995                 2       < 0.1%
99994                 1       < 0.1%
99993                 2       < 0.1%
99992                 2       < 0.1%
99991                 2       < 0.1%
99990                 2       < 0.1%
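
Several of the Student ID statistics are derived from others, so they can be cross-checked by hand. The sketch below is a consistency check on the quoted figures, not the tool's own computation. The non-missing denominator for Distinct (%) is inferred from the fact that it reproduces the reported 55.0%.

```python
# Values copied from the Student ID statistics in this report.
mean = 49974.042
std = 28879.642
q1, q3 = 24971.75, 74986.0
minimum, maximum = 1, 100000
distinct, missing, n_rows = 98976, 20000, 200000

cv = std / mean                 # coefficient of variation, reported as 0.57789285
iqr = q3 - q1                   # interquartile range, reported as 50014.25
value_range = maximum - minimum # reported as 99999
variance = std ** 2             # reported as 8.340337 x 10^8

# Distinct (%) appears to use the non-missing rows as its denominator.
distinct_pct = 100.0 * distinct / (n_rows - missing)

print(round(cv, 4), iqr, value_range, round(distinct_pct, 1))
```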

Age
Real number (ℝ)

MISSING 

Distinct: 17
Distinct (%): < 0.1%
Missing: 20000
Missing (%): 10.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 26.021561
Minimum: 18
Maximum: 34
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.5 MiB

Quantile statistics

Minimum: 18
5-th percentile: 18
Q1: 22
median: 26
Q3: 30
95-th percentile: 34
Maximum: 34
Range: 16
Interquartile range (IQR): 8

Descriptive statistics

Standard deviation: 4.8905278
Coefficient of variation (CV): 0.18794137
Kurtosis: -1.2038811
Mean: 26.021561
Median Absolute Deviation (MAD): 4
Skewness: -0.0033658352
Sum: 4683881
Variance: 23.917262
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=17)
Common values

Value                 Count   Frequency (%)
28                    10882   5.4%
27                    10755   5.4%
33                    10703   5.4%
22                    10691   5.3%
25                    10683   5.3%
21                    10677   5.3%
29                    10676   5.3%
34                    10660   5.3%
24                    10600   5.3%
20                    10566   5.3%
Other values (7)      73107   36.6%
(Missing)             20000   10.0%

Minimum 10 values

Value                 Count   Frequency (%)
18                    10383   5.2%
19                    10413   5.2%
20                    10566   5.3%
21                    10677   5.3%
22                    10691   5.3%
23                    10335   5.2%
24                    10600   5.3%
25                    10683   5.3%
26                    10486   5.2%
27                    10755   5.4%

Maximum 10 values

Value                 Count   Frequency (%)
34                    10660   5.3%
33                    10703   5.4%
32                    10510   5.3%
31                    10541   5.3%
30                    10439   5.2%
29                    10676   5.3%
28                    10882   5.4%
27                    10755   5.4%
26                    10486   5.2%
25                    10683   5.3%
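
The near-equal counts per age and the kurtosis close to -1.2 are what one would expect if Age were spread evenly over the 17 integers from 18 to 34. As a hedged illustration (an interpretation of the figures, not something the report states), the textbook moments of a discrete uniform distribution can be compared with the reported variance (23.917262) and kurtosis (-1.2038811):

```python
from typing import Tuple


def discrete_uniform_moments(n: int) -> Tuple[float, float]:
    """Variance and excess kurtosis of a uniform distribution on n consecutive integers."""
    variance = (n * n - 1) / 12
    excess_kurtosis = -6 * (n * n + 1) / (5 * (n * n - 1))
    return variance, excess_kurtosis


var_age, kurt_age = discrete_uniform_moments(17)   # ages 18..34 -> 17 values
print(round(var_age, 3), round(kurt_age, 4))       # 24.0 -1.2083
```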

Gender
Categorical

MISSING 

Distinct: 2
Distinct (%): < 0.1%
Missing: 20000
Missing (%): 10.0%
Memory size: 1.5 MiB
Male: 90005
Female: 89995

Length

Max length: 6
Median length: 4
Mean length: 4.9999444
Min length: 4

Characters and Unicode

Total characters: 899990
Distinct characters: 6
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
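
The length and character totals for Gender follow from the two category counts. A minimal sketch of that arithmetic, using only the counts quoted in this section:

```python
counts = {"Male": 90005, "Female": 89995}   # non-missing Gender values

total_values = sum(counts.values())
total_chars = sum(len(value) * n for value, n in counts.items())
mean_length = total_chars / total_values

print(total_chars, round(mean_length, 7))   # 899990 4.9999444
```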

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Female
2nd row: Male
3rd row: Female
4th row: Male
5th row: Female

Common Values

Value                 Count   Frequency (%)
Male                  90005   45.0%
Female                89995   45.0%
(Missing)             20000   10.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value                 Count   Frequency (%)
male                  90005   50.0%
female                89995   50.0%

Most occurring characters

Value                 Count   Frequency (%)
e                     269995  30.0%
a                     180000  20.0%
l                     180000  20.0%
M                     90005   10.0%
F                     89995   10.0%
m                     89995   10.0%

Most occurring categories

Value                 Count   Frequency (%)
Lowercase Letter      719990  80.0%
Uppercase Letter      180000  20.0%

Most frequent character per category

Lowercase Letter
Value                 Count   Frequency (%)
e                     269995  37.5%
a                     180000  25.0%
l                     180000  25.0%
m                     89995   12.5%

Uppercase Letter
Value                 Count   Frequency (%)
M                     90005   50.0%
F                     89995   50.0%

Most occurring scripts

Value                 Count   Frequency (%)
Latin                 899990  100.0%

Most frequent character per script

Latin
Value                 Count   Frequency (%)
e                     269995  30.0%
a                     180000  20.0%
l                     180000  20.0%
M                     90005   10.0%
F                     89995   10.0%
m                     89995   10.0%

Most occurring blocks

Value                 Count   Frequency (%)
ASCII                 899990  100.0%

Most frequent character per block

ASCII
Value                 Count   Frequency (%)
e                     269995  30.0%
a                     180000  20.0%
l                     180000  20.0%
M                     90005   10.0%
F                     89995   10.0%
m                     89995   10.0%

Height
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 98992
Distinct (%): 55.0%
Missing: 20000
Missing (%): 10.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 174.9471
Minimum: 150.00004
Maximum: 199.99864
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.5 MiB

Quantile statistics

Minimum: 150.00004
5-th percentile: 152.4158
Q1: 162.47611
median: 174.89991
Q3: 187.46442
95-th percentile: 197.45481
Maximum: 199.99864
Range: 49.998597
Interquartile range (IQR): 24.988307

Descriptive statistics

Standard deviation: 14.44756
Coefficient of variation (CV): 0.082582446
Kurtosis: -1.2008886
Mean: 174.9471
Median Absolute Deviation (MAD): 12.506777
Skewness: 0.0022729101
Sum: 31490478
Variance: 208.73198
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Common values

Value                 Count   Frequency (%)
161.7779242           2       < 0.1%
185.0364512           2       < 0.1%
190.8724165           2       < 0.1%
154.9205933           2       < 0.1%
171.1549261           2       < 0.1%
196.1785543           2       < 0.1%
181.9949274           2       < 0.1%
198.5948733           2       < 0.1%
155.4997801           2       < 0.1%
174.7993486           2       < 0.1%
Other values (98982)  179980  90.0%
(Missing)             20000   10.0%

Minimum 10 values

Value                 Count   Frequency (%)
150.0000414           1       < 0.1%
150.0003289           2       < 0.1%
150.0008757           1       < 0.1%
150.0009957           2       < 0.1%
150.0021254           2       < 0.1%
150.004363            2       < 0.1%
150.0043832           2       < 0.1%
150.0053413           2       < 0.1%
150.0070718           2       < 0.1%
150.0070791           2       < 0.1%

Maximum 10 values

Value                 Count   Frequency (%)
199.9986387           1       < 0.1%
199.9979397           2       < 0.1%
199.9969655           1       < 0.1%
199.9968342           2       < 0.1%
199.9967708           2       < 0.1%
199.9954615           1       < 0.1%
199.99526             2       < 0.1%
199.9946035           2       < 0.1%
199.9935183           2       < 0.1%
199.993175            2       < 0.1%

Weight
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 99026
Distinct (%): 55.0%
Missing: 20000
Missing (%): 10.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 69.971585
Minimum: 40.000578
Maximum: 99.999907
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.5 MiB

Quantile statistics

Minimum: 40.000578
5-th percentile: 42.950006
Q1: 54.969838
median: 69.979384
Q3: 84.980097
95-th percentile: 97.000203
Maximum: 99.999907
Range: 59.999329
Interquartile range (IQR): 30.01026

Descriptive statistics

Standard deviation: 17.322574
Coefficient of variation (CV): 0.24756584
Kurtosis: -1.2009818
Mean: 69.971585
Median Absolute Deviation (MAD): 15.006275
Skewness: 0.0048199774
Sum: 12594885
Variance: 300.07158
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Common values

Value                 Count   Frequency (%)
72.35494708           2       < 0.1%
89.94904672           2       < 0.1%
91.20816557           2       < 0.1%
62.32576927           2       < 0.1%
62.57737222           2       < 0.1%
57.13374418           2       < 0.1%
47.55671987           2       < 0.1%
51.31386665           2       < 0.1%
49.63270021           2       < 0.1%
84.1466654            2       < 0.1%
Other values (99016)  179980  90.0%
(Missing)             20000   10.0%

Minimum 10 values

Value                 Count   Frequency (%)
40.00057777           2       < 0.1%
40.00071806           2       < 0.1%
40.00167958           2       < 0.1%
40.00211019           2       < 0.1%
40.00278931           2       < 0.1%
40.0033142            2       < 0.1%
40.0043693            2       < 0.1%
40.00469245           2       < 0.1%
40.00501204           2       < 0.1%
40.00506022           2       < 0.1%

Maximum 10 values

Value                 Count   Frequency (%)
99.99990661           1       < 0.1%
99.99945951           2       < 0.1%
99.99872464           2       < 0.1%
99.99766773           2       < 0.1%
99.99710962           2       < 0.1%
99.99708677           1       < 0.1%
99.99698818           2       < 0.1%
99.9958793            2       < 0.1%
99.99574444           2       < 0.1%
99.9955893            2       < 0.1%

Blood Type
Categorical

MISSING 

Distinct: 4
Distinct (%): < 0.1%
Missing: 20000
Missing (%): 10.0%
Memory size: 1.5 MiB
B: 45537
O: 45511
AB: 44486
A: 44466

Length

Max length: 2
Median length: 1
Mean length: 1.2471444
Min length: 1

Characters and Unicode

Total characters: 224486
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: O
2nd row: B
3rd row: A
4th row: B
5th row: O

Common Values

Value                 Count   Frequency (%)
B                     45537   22.8%
O                     45511   22.8%
AB                    44486   22.2%
A                     44466   22.2%
(Missing)             20000   10.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value                 Count   Frequency (%)
b                     45537   25.3%
o                     45511   25.3%
ab                    44486   24.7%
a                     44466   24.7%

Most occurring characters

Value                 Count   Frequency (%)
B                     90023   40.1%
A                     88952   39.6%
O                     45511   20.3%

Most occurring categories

Value                 Count   Frequency (%)
Uppercase Letter      224486  100.0%

Most frequent character per category

Uppercase Letter
Value                 Count   Frequency (%)
B                     90023   40.1%
A                     88952   39.6%
O                     45511   20.3%

Most occurring scripts

Value                 Count   Frequency (%)
Latin                 224486  100.0%

Most frequent character per script

Latin
Value                 Count   Frequency (%)
B                     90023   40.1%
A                     88952   39.6%
O                     45511   20.3%

Most occurring blocks

Value                 Count   Frequency (%)
ASCII                 224486  100.0%

Most frequent character per block

ASCII
Value                 Count   Frequency (%)
B                     90023   40.1%
A                     88952   39.6%
O                     45511   20.3%

BMI
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 98983
Distinct (%): 55.0%
Missing: 20000
Missing (%): 10.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 23.338869
Minimum: 10.074837
Maximum: 44.355113
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.5 MiB

Quantile statistics

Minimum: 10.074837
5-th percentile: 13.077572
Q1: 17.858396
median: 22.671401
Q3: 27.997487
95-th percentile: 36.309183
Maximum: 44.355113
Range: 34.280276
Interquartile range (IQR): 10.139092

Descriptive statistics

Standard deviation: 7.0335537
Coefficient of variation (CV): 0.30136651
Kurtosis: -0.42473137
Mean: 23.338869
Median Absolute Deviation (MAD): 5.0210732
Skewness: 0.43835089
Sum: 4200996.5
Variance: 49.470878
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Common values

Value                 Count   Frequency (%)
27.64583507           2       < 0.1%
21.31288797           2       < 0.1%
17.88748454           2       < 0.1%
17.1649642            2       < 0.1%
24.57522201           2       < 0.1%
17.08301526           2       < 0.1%
14.97046433           2       < 0.1%
20.59578432           2       < 0.1%
35.26719381           2       < 0.1%
28.65053448           2       < 0.1%
Other values (98973)  179980  90.0%
(Missing)             20000   10.0%

Minimum 10 values

Value                 Count   Frequency (%)
10.07483709           2       < 0.1%
10.0814309            2       < 0.1%
10.0901314            2       < 0.1%
10.10207858           2       < 0.1%
10.11272071           2       < 0.1%
10.12641305           2       < 0.1%
10.12673379           1       < 0.1%
10.13625472           2       < 0.1%
10.13677254           2       < 0.1%
10.13936456           2       < 0.1%

Maximum 10 values

Value                 Count   Frequency (%)
44.3551126            1       < 0.1%
44.31407361           2       < 0.1%
44.28800321           2       < 0.1%
44.19402058           2       < 0.1%
44.17538675           1       < 0.1%
44.15754669           2       < 0.1%
44.09093573           2       < 0.1%
44.07686164           2       < 0.1%
44.07419486           2       < 0.1%
44.0514756            2       < 0.1%
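
The high correlation of BMI with Height and Weight is expected, because BMI is a function of the two. The sketch below uses the standard BMI definition and assumes Height is in centimetres and Weight in kilograms. The report does not state units, but under that assumption the first sample row (Height 161.777924, Weight 72.354947) reproduces its listed BMI of 27.645835.

```python
def bmi(height_cm: float, weight_kg: float) -> float:
    """Body mass index: weight in kilograms divided by height in metres squared."""
    height_m = height_cm / 100.0   # assumed unit conversion (cm -> m)
    return weight_kg / (height_m ** 2)


# First sample row of the report: Height 161.777924, Weight 72.354947.
print(round(bmi(161.777924, 72.354947), 4))
```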

Temperature
Real number (ℝ)

MISSING 

Distinct: 99006
Distinct (%): 55.0%
Missing: 20000
Missing (%): 10.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 98.600948
Minimum: 96.397835
Maximum: 100.82486
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.5 MiB

Quantile statistics

Minimum: 96.397835
5-th percentile: 97.778561
Q1: 98.26475
median: 98.599654
Q3: 98.940543
95-th percentile: 99.426585
Maximum: 100.82486
Range: 4.4270217
Interquartile range (IQR): 0.67579279

Descriptive statistics

Standard deviation: 0.50053017
Coefficient of variation (CV): 0.0050763221
Kurtosis: 0.0069486449
Mean: 98.600948
Median Absolute Deviation (MAD): 0.33804288
Skewness: 0.010467187
Sum: 17748171
Variance: 0.25053045
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Common values

Value                 Count   Frequency (%)
98.89844658           2       < 0.1%
98.28320066           2       < 0.1%
98.058989             2       < 0.1%
98.08941353           2       < 0.1%
98.74886556           2       < 0.1%
98.16848078           2       < 0.1%
98.60092163           2       < 0.1%
97.99633484           2       < 0.1%
98.51004203           2       < 0.1%
98.58211416           2       < 0.1%
Other values (98996)  179980  90.0%
(Missing)             20000   10.0%

Minimum 10 values

Value                 Count   Frequency (%)
96.3978355            2       < 0.1%
96.59628951           2       < 0.1%
96.6095813            1       < 0.1%
96.64100649           2       < 0.1%
96.69389468           2       < 0.1%
96.75459508           2       < 0.1%
96.75597703           2       < 0.1%
96.76091358           2       < 0.1%
96.78533653           2       < 0.1%
96.81438808           2       < 0.1%

Maximum 10 values

Value                 Count   Frequency (%)
100.8248572           2       < 0.1%
100.7737647           2       < 0.1%
100.7456864           2       < 0.1%
100.6533907           2       < 0.1%
100.6128156           1       < 0.1%
100.5880979           2       < 0.1%
100.5874788           2       < 0.1%
100.5761755           2       < 0.1%
100.5664977           2       < 0.1%
100.5348293           2       < 0.1%

Heart Rate
Real number (ℝ)

MISSING 

Distinct: 40
Distinct (%): < 0.1%
Missing: 20000
Missing (%): 10.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 79.503767
Minimum: 60
Maximum: 99
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.5 MiB

Quantile statistics

Minimum: 60
5-th percentile: 61
Q1: 70
median: 80
Q3: 90
95-th percentile: 97
Maximum: 99
Range: 39
Interquartile range (IQR): 20

Descriptive statistics

Standard deviation: 11.540755
Coefficient of variation (CV): 0.14515985
Kurtosis: -1.1981236
Mean: 79.503767
Median Absolute Deviation (MAD): 10
Skewness: -0.0010632541
Sum: 14310678
Variance: 133.18903
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=40)
Common values

Value                 Count   Frequency (%)
77                    4679    2.3%
97                    4613    2.3%
63                    4603    2.3%
92                    4597    2.3%
73                    4588    2.3%
70                    4579    2.3%
74                    4576    2.3%
71                    4573    2.3%
61                    4571    2.3%
88                    4561    2.3%
Other values (30)     134060  67.0%
(Missing)             20000   10.0%

Minimum 10 values

Value                 Count   Frequency (%)
60                    4517    2.3%
61                    4571    2.3%
62                    4453    2.2%
63                    4603    2.3%
64                    4544    2.3%
65                    4425    2.2%
66                    4400    2.2%
67                    4269    2.1%
68                    4560    2.3%
69                    4317    2.2%

Maximum 10 values

Value                 Count   Frequency (%)
99                    4478    2.2%
98                    4513    2.3%
97                    4613    2.3%
96                    4430    2.2%
95                    4451    2.2%
94                    4471    2.2%
93                    4492    2.2%
92                    4597    2.3%
91                    4506    2.3%
90                    4504    2.3%

Blood Pressure
Real number (ℝ)

MISSING 

Distinct: 50
Distinct (%): < 0.1%
Missing: 20000
Missing (%): 10.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 114.55803
Minimum: 90
Maximum: 139
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.5 MiB

Quantile statistics

Minimum: 90
5-th percentile: 92
Q1: 102
median: 115
Q3: 127
95-th percentile: 137
Maximum: 139
Range: 49
Interquartile range (IQR): 25

Descriptive statistics

Standard deviation: 14.403353
Coefficient of variation (CV): 0.12572975
Kurtosis: -1.1954515
Mean: 114.55803
Median Absolute Deviation (MAD): 12
Skewness: -0.0048300908
Sum: 20620446
Variance: 207.45658
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Common values

Value                 Count   Frequency (%)
106                   3823    1.9%
117                   3741    1.9%
109                   3724    1.9%
135                   3714    1.9%
97                    3709    1.9%
91                    3703    1.9%
131                   3697    1.8%
136                   3697    1.8%
116                   3696    1.8%
128                   3682    1.8%
Other values (40)     142814  71.4%
(Missing)             20000   10.0%

Minimum 10 values

Value                 Count   Frequency (%)
90                    3520    1.8%
91                    3703    1.9%
92                    3539    1.8%
93                    3507    1.8%
94                    3551    1.8%
95                    3451    1.7%
96                    3586    1.8%
97                    3709    1.9%
98                    3531    1.8%
99                    3552    1.8%

Maximum 10 values

Value                 Count   Frequency (%)
139                   3584    1.8%
138                   3630    1.8%
137                   3431    1.7%
136                   3697    1.8%
135                   3714    1.9%
134                   3601    1.8%
133                   3599    1.8%
132                   3566    1.8%
131                   3697    1.8%
130                   3580    1.8%

Cholesterol
Real number (ℝ)

MISSING 

Distinct: 130
Distinct (%): 0.1%
Missing: 20000
Missing (%): 10.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 184.48636
Minimum: 120
Maximum: 249
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.5 MiB

Quantile statistics

Minimum: 120
5-th percentile: 126
Q1: 152
median: 184
Q3: 217
95-th percentile: 243
Maximum: 249
Range: 129
Interquartile range (IQR): 65

Descriptive statistics

Standard deviation: 37.559678
Coefficient of variation (CV): 0.20359054
Kurtosis: -1.2024382
Mean: 184.48636
Median Absolute Deviation (MAD): 32
Skewness: 0.0041550445
Sum: 33207545
Variance: 1410.7294
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Common values

Value                 Count   Frequency (%)
223                   1534    0.8%
155                   1501    0.8%
211                   1478    0.7%
215                   1476    0.7%
249                   1473    0.7%
157                   1457    0.7%
245                   1455    0.7%
127                   1453    0.7%
150                   1451    0.7%
161                   1450    0.7%
Other values (120)    165272  82.6%
(Missing)             20000   10.0%

Minimum 10 values

Value                 Count   Frequency (%)
120                   1400    0.7%
121                   1441    0.7%
122                   1373    0.7%
123                   1421    0.7%
124                   1310    0.7%
125                   1365    0.7%
126                   1384    0.7%
127                   1453    0.7%
128                   1328    0.7%
129                   1377    0.7%

Maximum 10 values

Value                 Count   Frequency (%)
249                   1473    0.7%
248                   1343    0.7%
247                   1317    0.7%
246                   1358    0.7%
245                   1455    0.7%
244                   1406    0.7%
243                   1422    0.7%
242                   1402    0.7%
241                   1387    0.7%
240                   1401    0.7%

Diabetes
Boolean

IMBALANCE  MISSING 

Distinct: 2
Distinct (%): < 0.1%
Missing: 20000
Missing (%): 10.0%
Memory size: 390.8 KiB

Value                 Count   Frequency (%)
False                 161986  81.0%
True                  18014   9.0%
(Missing)             20000   10.0%
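
The 53.1% imbalance reported for Diabetes can be reproduced from the two non-missing class counts. The sketch below assumes the score is one minus the Shannon entropy of the class distribution divided by its maximum possible value. That definition is an assumption here rather than something the report states, but it does yield the reported figure.

```python
import math


def imbalance_score(counts):
    """1 - (Shannon entropy / maximum entropy) for a list of class counts."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    max_entropy = math.log2(len(counts))
    return 1.0 - entropy / max_entropy


# Non-missing Diabetes counts from this section: False, True.
score = imbalance_score([161986, 18014])
print(f"{100 * score:.1f}%")   # 53.1%
```

Under the same definition, Smoking (143971 False, 36029 True) scores noticeably lower, which fits with it carrying no imbalance alert.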

Smoking
Boolean

MISSING 

Distinct: 2
Distinct (%): < 0.1%
Missing: 20000
Missing (%): 10.0%
Memory size: 390.8 KiB

Value                 Count   Frequency (%)
False                 143971  72.0%
True                  36029   18.0%
(Missing)             20000   10.0%

Interactions

[Figures: 81 pairwise interaction plots (9 × 9 numeric variables); images not preserved in this text version.]

Correlations

Variable        Student ID  Age     Height  Weight  BMI     Temperature  Heart Rate  Blood Pressure  Cholesterol  Gender  Blood Type  Diabetes  Smoking
Student ID      1.000       0.000   0.003   -0.001  -0.003  0.004        0.004       0.002           -0.003       0.011   0.007       0.009     0.003
Age             0.000       1.000   0.002   0.005   0.003   0.003        0.005       -0.004          -0.001       0.009   0.012       0.005     0.010
Height          0.003       0.002   1.000   -0.000  -0.524  -0.005       0.006       0.004           0.002        0.008   0.007       0.011     0.006
Weight          -0.001      0.005   -0.000  1.000   0.841   0.000        -0.000      0.002           -0.002       0.006   0.006       0.006     0.007
BMI             -0.003      0.003   -0.524  0.841   1.000   0.001        -0.003      -0.001          -0.003       0.004   0.006       0.008     0.005
Temperature     0.004       0.003   -0.005  0.000   0.001   1.000        -0.005      -0.004          -0.000       0.000   0.007       0.006     0.005
Heart Rate      0.004       0.005   0.006   -0.000  -0.003  -0.005       1.000       0.003           0.005        0.009   0.005       0.012     0.003
Blood Pressure  0.002       -0.004  0.004   0.002   -0.001  -0.004       0.003       1.000           0.003        0.009   0.003       0.005     0.005
Cholesterol     -0.003      -0.001  0.002   -0.002  -0.003  -0.000       0.005       0.003           1.000        0.004   0.006       0.005     0.003
Gender          0.011       0.009   0.008   0.006   0.004   0.000        0.009       0.009           0.004        1.000   0.006       0.002     0.002
Blood Type      0.007       0.012   0.007   0.006   0.006   0.007        0.005       0.003           0.006        0.006   1.000       0.000     0.010
Diabetes        0.009       0.005   0.011   0.006   0.008   0.006        0.012       0.005           0.005        0.002   0.000       1.000     0.000
Smoking         0.003       0.010   0.006   0.007   0.005   0.005        0.003       0.005           0.003        0.002   0.010       0.000     1.000
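
Only two off-diagonal entries of this matrix are large enough to explain the "High correlation" alerts. The sketch below filters a handful of entries copied from the matrix by absolute value. The 0.5 cut-off is an assumed threshold, not stated in the report, and the dictionary holds only a subset of the matrix for brevity.

```python
# Selected entries copied from the correlation matrix in this section;
# every other off-diagonal value has an absolute value of at most 0.012.
pairs = {
    ("Height", "BMI"): -0.524,
    ("Weight", "BMI"): 0.841,
    ("Height", "Weight"): -0.000,
    ("Age", "Blood Type"): 0.012,
}
THRESHOLD = 0.5  # assumed alert cut-off, not stated in the report

flagged = sorted(pair for pair, r in pairs.items() if abs(r) >= THRESHOLD)
print(flagged)   # [('Height', 'BMI'), ('Weight', 'BMI')]
```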

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

First rows

Row     Student ID  Age   Gender  Height      Weight     Blood Type  BMI        Temperature  Heart Rate  Blood Pressure  Cholesterol  Diabetes  Smoking
0       1.0         18.0  Female  161.777924  72.354947  O           27.645835  NaN          95.0        109.0           203.0        No        NaN
1       2.0         NaN   Male    152.069157  47.630941  B           NaN        98.714977    93.0        104.0           163.0        No        No
2       3.0         32.0  Female  182.537664  55.741083  A           16.729017  98.260293    76.0        130.0           216.0        Yes       No
3       NaN         30.0  Male    182.112867  63.332207  B           19.096042  98.839605    99.0        112.0           141.0        No        Yes
4       5.0         23.0  Female  NaN         46.234173  O           NaN        98.480008    95.0        NaN             231.0        No        No
5       6.0         32.0  NaN     151.491294  68.647805  B           29.912403  99.668373    70.0        128.0           183.0        NaN       Yes
6       7.0         21.0  NaN     172.949704  48.102744  AB          16.081635  97.715469    66.0        134.0           247.0        No        No
7       8.0         28.0  Male    186.489402  52.389752  AB          15.063921  98.227788    85.0        123.0           128.0        No        No
8       9.0         21.0  Male    155.039678  42.958703  B           NaN        98.808053    NaN         111.0           243.0        No        No
9       10.0        32.0  NaN     170.836315  50.783250  B           17.400435  98.570168    61.0        94.0            166.0        NaN       No

Last rows

Row     Student ID  Age   Gender  Height      Weight     Blood Type  BMI        Temperature  Heart Rate  Blood Pressure  Cholesterol  Diabetes  Smoking
199990  99991.0     21.0  Female  183.735110  51.172076  AB          15.158238  97.998790    67.0        96.0            249.0        NaN       Yes
199991  99992.0     28.0  Male    183.499177  NaN        A           26.527962  97.321680    70.0        113.0           140.0        No        No
199992  99993.0     34.0  Male    161.590030  90.877589  B           34.803881  98.728836    70.0        96.0            208.0        No        No
199993  99994.0     22.0  Male    NaN         46.155224  A           NaN        98.331019    93.0        100.0           NaN          Yes       No
199994  99995.0     22.0  Male    159.486907  NaN        A           27.631082  98.971976    86.0        134.0           208.0        No        NaN
199995  NaN         24.0  Male    176.503260  95.756997  B           30.737254  99.170685    65.0        121.0           130.0        No        No
199996  99997.0     29.0  Female  163.917675  45.225194  NaN         16.831734  97.865785    62.0        125.0           198.0        No        Yes
199997  99998.0     34.0  Female  NaN         99.648914  NaN         33.189303  98.768210    60.0        90.0            154.0        NaN       No
199998  99999.0     30.0  Female  156.446944  50.142824  A           20.486823  98.994212    61.0        106.0           225.0        No        No
199999  100000.0    20.0  Female  153.927409  99.928405  O           42.175189  98.595817    95.0        133.0           132.0        NaN       No

Duplicate rows

Most frequently occurring

Row     Student ID  Age   Gender  Height      Weight     Blood Type  BMI        Temperature  Heart Rate  Blood Pressure  Cholesterol  Diabetes  Smoking  # duplicates
0       8.0         28.0  Male    186.489402  52.389752  AB          15.063921  98.227788    85.0        123.0           128.0        No        No       2
1       12.0        34.0  Female  182.416302  76.371050  AB          22.950992  98.118274    86.0        97.0            247.0        No        No       2
2       19.0        31.0  Female  158.790160  46.829849  AB          18.572723  98.784709    92.0        102.0           172.0        NaN       No       2
3       23.0        29.0  Female  179.909041  90.679436  AB          28.015787  98.782269    81.0        108.0           227.0        No        Yes      2
4       24.0        18.0  Male    NaN         52.521560  AB          13.570402  98.215090    60.0        132.0           217.0        No        No       2
5       25.0        27.0  Female  187.411623  81.219470  AB          23.124221  97.738939    99.0        135.0           123.0        No        No       2
6       36.0        21.0  Male    183.476287  61.469995  O           18.260106  99.346920    62.0        127.0           233.0        No        No       2
7       52.0        23.0  Male    174.338438  45.421333  AB          14.944231  98.672024    89.0        101.0           236.0        No        No       2
8       79.0        27.0  Female  187.852506  82.352369  B           23.336843  98.300921    78.0        99.0            206.0        No        No       2
9       87.0        34.0  Female  150.942632  90.580214  O           39.756624  97.563234    79.0        135.0           198.0        Yes       No       2